1 Objectives

This notebook aims at

2 Life tables data (ETL)

We investigate life tables describing countries from Western Europe (France, Great Britain –actually England and Wales–, Italy, the Netherlands, Spain, and Sweden) and the United States.

We load the one-year life tables for the female, male, and total populations of each country.

The meaning of the different columns:

mx: Central death rate between ages x and x+n where n=1, 4, 5, or ∞ (open age interval)

qx: Probability of death between ages x and x+n

ax: Average length of survival between ages x and x+n for persons dying in the interval

lx: Number of survivors at exact age x, assuming l(0) = 100,000

dx: Number of deaths between ages x and x+n

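Lx: Number of person-years lived between ages x and x+n

Tx: Number of person-years remaining after exact age x

ex: Life expectancy at exact age x (in years)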

But some of the columns need retyping:


Column Name Column Type
Year integer
Age integer
mx double
qx double
ax double
lx integer
dx integer
Lx integer
Tx integer
ex double
Country factor
Gender factor

Coercion introduces a substantial number of NA warnings. Preliminary inspection of the data suggests that the coercion problems originate from column Age: the open age interval 110+ cannot be coerced to an integer value. We discard the corresponding rows using tidyr::drop_na(Age).
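The retyping and filtering step can be sketched as follows. This is a minimal illustration, assuming the raw table was read with every column as character; the tibble `raw` and its values are made up, and only a few of the columns are shown. The coercion warnings are silenced here because the NA produced by `110+` is expected.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw life table (all columns read as character).
# Values are illustrative only, not taken from the actual data.
raw <- tibble(
  Year = c("1948", "1948", "1948"),
  Age = c("0", "109", "110+"),
  mx = c("0.05", "0.60", "0.70"),
  lx = c("100000", "5", "2"),
  Country = c("USA", "USA", "USA"),
  Gender = c("Female", "Female", "Female")
)

life_table <- raw %>%
  mutate(
    # "110+" cannot be parsed as an integer and becomes NA
    across(c(Year, Age, lx), ~ suppressWarnings(as.integer(.x))),
    across(mx, as.numeric),
    across(c(Country, Gender), as.factor)
  ) %>%
  # discard the open age interval rows
  drop_na(Age)
```

After this step `life_table` keeps only the rows whose Age could be parsed as an integer.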

3 Western countries in 1948

In 1948, death rates for newborns are much higher in Italy and Spain than in the other European countries and in the USA. The difference is still noticeable for infant mortality, but adult death rates are nearly the same in all countries. The gap in mortality at young ages could be explained by differences in economic and health conditions between the countries at that time.

The ratio of the central death rate in the Netherlands to the central death rate in the USA is less than 1, which means that in 1948 the central death rate was lower in the Netherlands than in the USA. For almost all the other European countries this ratio is greater than 1, which means that most European countries had a higher central death rate than the USA in 1948, Italy in particular. This difference could be explained by the health conditions and the financial situation of the two continents at that time.

4 Death rates evolution since WW II

We notice that the mortality quotients of young people in 1946 are smaller in the USA than in all the European countries. This is likely because the USA suffered far fewer human losses during WWII than the European countries.

We modify our dataframe so it has the following schema:

Column Name Column Type
Year integer
Age integer
mx double
mx.ref_year double
Country factor
Gender factor

where (Country, Year, Age, Gender) serves as a primary key, mx denotes the central death rate at Age for Year and Gender in Country, and mx.ref_year denotes the central death rate at Age in the reference year (argument reference_year) for the same Country and Gender.
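One way to build such a table is a self-join on (Country, Gender, Age). The sketch below is only an assumption about how this could be done: the helper name `add_ref_year` is hypothetical, and the toy values are invented.

```r
library(dplyr)

# Hypothetical helper: attach to each row the central death rate observed
# in `reference_year` for the same Country, Gender and Age.
add_ref_year <- function(df, reference_year = 1946L) {
  ref <- df %>%
    filter(Year == reference_year) %>%
    select(Country, Gender, Age, mx)

  df %>%
    select(Year, Age, mx, Country, Gender) %>%
    inner_join(ref, by = c("Country", "Gender", "Age"),
               suffix = c("", ".ref_year")) %>%
    relocate(mx.ref_year, .after = mx)
}

# Toy data (made-up rates) for one country and gender.
toy <- tibble(
  Year = c(1946L, 1946L, 1956L, 1956L),
  Age = c(0L, 1L, 0L, 1L),
  mx = c(0.04, 0.004, 0.03, 0.002),
  Country = factor("USA"),
  Gender = factor("Female")
)

with_ref <- add_ref_year(toy, reference_year = 1946L)

# Ratio of each year's rate to the reference year's rate.
ratios <- with_ref %>% mutate(ratio = mx / mx.ref_year)
```

The ratio column equals 1 for the reference year itself, and values below 1 indicate a lower death rate than in the reference year.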

As we have done since the beginning, we concentrate on the comparison between the USA and the Netherlands.

In the USA, the ratio of the mortality rate in each year after 1946 to the mortality rate in 1946 has always been below 1 at all ages, which means that mortality at every age has been lower than in 1946 ever since. In the Netherlands, this ratio has been higher than in the USA for all years, especially at older ages. We also notice a difference for newborns between the two countries: in 1956 the ratio is twice as high in the USA as in the Netherlands, which means that newborn mortality in 1946 was much higher in the Netherlands relative to the other years, whereas the gap between 1946 and the other years is smaller for the USA. Since 2006, the ratio at age 25 has been higher in the USA than in the Netherlands.

6 Rearrangement

The resulting schema looks like:

Column Name Type
Country factor
Gender factor
Year integer
0 double
1 double
2 double
3 double
\(\vdots\) \(\vdots\)
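A rearrangement of this kind can be obtained with tidyr::pivot_wider. The sketch below assumes the cell values are the central death rates mx (the source does not say whether mx or its logarithm is stored); the toy values are invented.

```r
library(dplyr)
library(tidyr)

# Toy long-format table (made-up rates): one row per Country, Gender, Year, Age.
toy_long <- tibble(
  Country = factor("USA"),
  Gender = factor("Female"),
  Year = rep(c(1946L, 1947L), each = 3),
  Age = rep(0:2, times = 2),
  mx = c(0.04, 0.004, 0.002, 0.038, 0.0038, 0.0019)
)

# One row per (Country, Gender, Year), one column per age.
toy_wide <- toy_long %>%
  select(Country, Gender, Year, Age, mx) %>%
  pivot_wider(names_from = Age, values_from = mx)
```

Each age becomes a column named after the age itself, so the wide table has as many rows as there are (Country, Gender, Year) combinations.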

7 Life expectancy

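\[ e_x = \sum_{k \geq 1} \prod_{j=0}^{k-1} \left(1 - m_{x+j}\right) \]

where \(m_{x+j}\) is the central death rate at age \(x+j\) (column mx), so that each product approximates the probability of surviving \(k\) more years from age \(x\).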
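The formula can be evaluated with cumulative products. The sketch below reads it as a sum over horizons k of the probability of surviving k more years, with 1 - mx used as the one-year survival probability; it also assumes the sum is truncated at the last age available in the table. The function name `life_expectancy` and the toy rates are hypothetical.

```r
# Residual life expectancy at each age from a vector of one-year rates.
# For the age at position i, sum over k of prod_{j < k} (1 - mx[i + j]),
# truncated at the last age present in the vector.
life_expectancy <- function(mx) {
  n <- length(mx)
  vapply(seq_len(n), function(i) sum(cumprod(1 - mx[i:n])), numeric(1))
}

# Toy rates for three consecutive ages; nobody survives the last one.
mx_toy <- c(0.5, 0.5, 1)
ex_toy <- life_expectancy(mx_toy)
ex_toy
# 0.75 0.50 0.00
```

Applied to one row of the wide table (one Country, Gender and Year), the first element of the result approximates life expectancy at birth.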

8 PCA and SVD over log-mortality tables

The scree plot displays how much of the variance each principal component captures. Since the curve is steep, bends quickly, and then flattens out, the first two PCs are sufficient to describe the essence of the data. In that sense, PCA works well on our data.
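The quantities behind a scree plot can be computed directly from the PCA output. The sketch below uses simulated data, not the life tables: `log_mx` is a made-up matrix with years in rows and ages in columns, driven by a single common trend.

```r
set.seed(1)

# Simulated log-mortality matrix: 30 "years" (rows) by 5 "ages" (columns),
# one common downward trend plus small noise.
trend <- seq(1, -1, length.out = 30)
loadings <- c(1.0, 0.8, 0.6, 0.4, 0.2)
log_mx <- outer(trend, loadings) + matrix(rnorm(150, sd = 0.01), 30, 5)

pca <- prcomp(log_mx, center = TRUE, scale. = FALSE)

# Share of the total variance captured by each principal component,
# which is what the scree plot displays.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# The same shares from the SVD of the centred matrix.
centred <- scale(log_mx, center = TRUE, scale = FALSE)
d <- svd(centred)$d
var_from_svd <- d^2 / sum(d^2)
```

A steep scree plot corresponds to `cum_var` getting close to 1 after the first one or two components.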

We see on the correlation circle that infant mortality is inversely correlated with life expectancy. Indeed, on the left side of the circle, the arrows for the advanced ages point down whereas those for the younger ages point up, and the mx arrow points to the right side of the circle. We have to bear in mind, however, that the oldest ages represent a small percentage of the total population.

We see that the recent years lie mostly on the right side of the biplot, which means that they follow the direction of mx on the correlation circle. The PCA thus allows us to conclude that life expectancy increases as time goes by.

9 Canonical Correlation Analysis

##   [1] 0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17 
##  [19] 18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35 
##  [37] 36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53 
##  [55] 54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71 
##  [73] 72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89 
##  [91] 90  91  92  93  94  95  96  97  98  99  100 101 102 103 104 105 106 107
## [109] 108 109
## <0 rows> (or 0-length row.names)

##      A1 Moisture Management      Use Manure
## 1   2.8        1         SF Haypastu      4
## 2   3.5        1         BF Haypastu      2
## 3   4.3        2         SF Haypastu      4
## 4   4.2        2         SF Haypastu      4
## 5   6.3        1         HF Hayfield      2
## 6   4.3        1         HF Haypastu      2
## 7   2.8        1         HF  Pasture      3
## 8   4.2        5         HF  Pasture      3
## 9   3.7        4         HF Hayfield      1
## 10  3.3        2         BF Hayfield      1
## 11  3.5        1         BF  Pasture      1
## 12  5.8        4         SF Haypastu      2
## 13  6.0        5         SF Haypastu      3
## 14  9.3        5         NM  Pasture      0
## 15 11.5        5         NM Haypastu      0
## 16  5.7        5         SF  Pasture      3
## 17  4.0        2         NM Hayfield      0
## 18  4.6        1         NM Hayfield      0
## 19  3.7        5         NM Hayfield      0
## 20  3.5        5         NM Hayfield      0

10 Lee-Carter model for US mortality

During the last century, in the USA and in western Europe, central death rates at all ages have exhibited a general decreasing trend. This decreasing trend has not always been homogeneous across ages.

The Lee-Carter model was designed to model and forecast the evolution of the log central death rates for the United States during the twentieth century.

Let \(A_{x,t}\) denote the log central death rate at age \(x\) during year \(t\in T\) for a given population (defined by Gender and Country).

The Lee-Carter model assumes that observed logarithmic central death rates are sampled according to the following model \[ A_{x,t} = a_x + b_x \kappa_t + \epsilon_{x,t} \] where \((a_x)_x, (b_x)_x\) and \((\kappa_t)_t\) are unknown vectors that satisfy \[ a_x = \frac{1}{|T|}\sum_{t \in T} A_{x,t}\qquad \sum_{t\in T} \kappa_t = 0 \qquad \sum_{x} b_x^2 =1 \] and \(\epsilon_{x,t}\) are i.i.d. Gaussian random variables.
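Under these constraints, a least-squares fit can be obtained from the leading singular triple of the row-centred matrix of log rates. The sketch below is one possible implementation, not necessarily the one used in this notebook: the function name `fit_lee_carter` is hypothetical, and the data are simulated from the model itself.

```r
# A: matrix of log central death rates, one row per age, one column per year.
fit_lee_carter <- function(A) {
  a <- rowMeans(A)               # a_x: average log rate at each age
  Z <- sweep(A, 1, a)            # centre each age (row) over time
  s <- svd(Z, nu = 1, nv = 1)
  b <- s$u[, 1]                  # unit Euclidean norm, so sum(b^2) = 1
  kappa <- s$d[1] * s$v[, 1]     # sums to zero because the rows of Z do
  # Fix the sign so that b is mostly positive (a pure convention).
  if (sum(b) < 0) {
    b <- -b
    kappa <- -kappa
  }
  list(a = a, b = b, kappa = kappa, fitted = a + outer(b, kappa))
}

# Simulated example: 5 ages, 10 years, made-up parameters.
set.seed(42)
a_true <- c(-3, -6, -7, -6.5, -6)
b_true <- c(0.6, 0.5, 0.4, 0.4, 0.27)
kappa_true <- seq(2, -2, length.out = 10)
A <- a_true + outer(b_true, kappa_true) + matrix(rnorm(50, sd = 0.01), 5, 10)

fit <- fit_lee_carter(A)
```

Because the fit is the rank-one truncated SVD of the centred matrix, `fit$b` and `fit$kappa` coincide, up to sign and scaling, with its first left and right singular vectors.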

10.1 US data

  • Fit a Lee-Carter model on the American data (for Male and Female data) training on years 1933 up to 1995.

  • Compare the fit provided by the Lee-Carter model with the fit provided by a rank \(2\) truncated SVD
  • Compare vectors \((a_x)_x, (b_x)_x\) and \((\kappa_t)_t\) with appropriate singular vectors.
  • Use the Lee-Carter model to predict the central death rates for years \(2000\) up to \(2015\)
  • Plot predictions and observations for years \(2000, 2005, 2010, 2015\)
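For the prediction step, the text does not say how \((\kappa_t)_t\) is extrapolated. A common choice, assumed here, is a random walk with drift; the helper names `forecast_kappa` and `predict_log_mx` and all parameter values below are hypothetical.

```r
# Point forecast of kappa h steps ahead under a random walk with drift:
# kappa_{t+1} = kappa_t + drift + noise, drift estimated by the mean increment.
forecast_kappa <- function(kappa, h) {
  drift <- mean(diff(kappa))
  kappa[length(kappa)] + drift * seq_len(h)
}

# Predicted log central death rates a_x + b_x * kappa_t,
# one row per age and one column per future year.
predict_log_mx <- function(a, b, kappa_future) {
  a + outer(b, kappa_future)
}

# Made-up parameters standing in for a fit on years 1933 to 1995.
a <- c(-3, -6, -5)
b <- c(0.8, 0.5, 0.33)
kappa <- seq(10, -10, length.out = 63)   # one value per training year

kappa_future <- forecast_kappa(kappa, h = 20)   # years 1996 to 2015
log_mx_pred <- predict_log_mx(a, b, kappa_future)
mx_2015 <- exp(log_mx_pred[, 20])               # predicted rates for 2015
```

Columns 5, 10, 15 and 20 of `log_mx_pred` correspond to years 2000, 2005, 2010 and 2015, which can then be plotted against the observed rates.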

10.2 Application of Lee-Carter model to a European Country

  • Fit a Lee-Carter model to a European country

  • Comment
  • Compare with rank-2 truncated SVD
  • Use the Lee-Carter model to predict the central death rates for years \(2000\) up to \(2015\)
  • Plot predictions and observations for years \(2000, 2005, 2010, 2015\)

10.3 Predictions of life expectancies at different ages

  • Use the Lee-Carter approximation to approximate residual life expectancies

  • Compare with observed residual life expectancies
